173 research outputs found

Solving All-k-Nearest Neighbor Problem without an Index

Author: Chávez Edgar
Ludueña Verónica
Reyes Nora Susana
Publication date: 10/03/2020

Among the similarity queries in metric spaces, there is one that obtains the k nearest neighbors of every element in the database (All-k-NN). One way to solve it is the naïve one: compare each object in the database with all the others and return the k elements nearest to it (k-NN). Another way is to preprocess the database to build an index and then search this index for the k-NN of each element of the dataset. Solving the All-k-NN problem makes it possible to build the k-Nearest Neighbor Graph (kNNG). Given a collection of objects in a metric space, the Nearest Neighbor Graph (NNG) associates each node with its closest neighbor under the given metric. If we link each object to its k nearest neighbors, we obtain the kNNG. The kNNG can itself be considered an index for a database, one that is quite efficient and leaves room for improvement. In this work, we propose a new technique to solve the All-k-NN problem that does not use any index to obtain the k-NN of each element. This approach avoids as many comparisons as possible, comparing only some database elements and taking advantage of the properties of the distance function. Its total cost is significantly lower than that of the naïve solution.

XVI Workshop Bases de Datos y Minería de Datos. Red de Universidades con Carreras en Informática
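For orientation, the naïve All-k-NN baseline that the abstract compares against can be sketched as follows. This is only the brute-force reference, not the authors' index-free technique; the names `all_knn_naive` and `euclidean` are illustrative, and Euclidean distance stands in for an arbitrary metric.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function obeying the metric axioms would do."""
    return math.dist(a, b)


def all_knn_naive(
    db: Sequence[Point],
    k: int,
    dist: Callable[[Point, Point], float] = euclidean,
) -> Dict[int, List[int]]:
    """Brute-force All-k-NN: returns, for each index, its k nearest indices.

    Symmetry of the metric lets each pair be evaluated once,
    so the cost is n*(n-1)/2 distance computations.
    """
    n = len(db)
    rows: List[List[Tuple[float, int]]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(db[i], db[j])
            rows[i].append((d, j))
            rows[j].append((d, i))
    # The adjacency lists of the result are exactly the kNNG.
    return {i: [j for _, j in heapq.nsmallest(k, rows[i])] for i in range(n)}


if __name__ == "__main__":
    points = [(0.0,), (1.0,), (3.0,), (7.0,)]
    print(all_knn_naive(points, k=2))
```

The quadratic number of distance evaluations in this baseline is the cost that an index-based or index-free method tries to reduce.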

Servicio de Difusión de la Creación Intelectual

An unbalanced approach to metric space searching

Author: Chávez Edgar
Ludueña Verónica
Reyes Nora Susana
Publication date: 01/05/2005

Proximity queries (the searching problem generalized beyond exact match) are mostly modeled using metric spaces. A metric space consists of a collection of objects and a distance function defined among them. The goal is to preprocess the data set (a slow procedure) so that proximity queries can be answered quickly. This problem has received a lot of attention recently, especially in the pattern recognition community. The Excluded Middle Vantage Point Forest (VP–forest) is a data structure designed to search in high-dimensional vector spaces. A VP–forest is built as a collection of balanced Vantage Point Trees (VP–trees). In this work we propose a novel two-fold approach for searching. First, we extend the VP–forest to search in metric spaces; more importantly, we test a counterintuitive modification to the VP–tree, namely to unbalance it. In exact searching an unbalanced data structure performs poorly, and most of the algorithmic effort is directed at obtaining a balanced data structure. The unbalancing approach is motivated by a recent data structure (the List of Clusters) specialized in high-dimensional metric space searches, which is an extremely unbalanced data structure (a linked list) that outperforms other approaches.

Eje: Algoritmos. Red de Universidades con Carreras en Informática (RedUNCI)
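As background for the abstract above, a minimal sketch of a classic balanced VP-tree with range search is given here. It is not the Excluded Middle VP-forest and not the unbalanced variant the authors test: the median split, the random vantage choice, and the names `VPNode`, `build_vp_tree` and `range_search` are assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


@dataclass
class VPNode:
    vantage: Point
    radius: float = 0.0                  # median distance to the vantage point
    inside: Optional["VPNode"] = None    # objects with d(vantage, u) <= radius
    outside: Optional["VPNode"] = None   # objects with d(vantage, u) > radius


def build_vp_tree(points: Sequence[Point], dist: Metric = math.dist,
                  rng: Optional[random.Random] = None) -> Optional[VPNode]:
    """Balanced build: split the remaining objects at the median distance."""
    if rng is None:
        rng = random.Random(0)
    items = list(points)
    if not items:
        return None
    vantage = items.pop(rng.randrange(len(items)))
    node = VPNode(vantage)
    if items:
        scored = [(dist(vantage, p), p) for p in items]
        dists = sorted(d for d, _ in scored)
        node.radius = dists[(len(dists) - 1) // 2]
        node.inside = build_vp_tree(
            [p for d, p in scored if d <= node.radius], dist, rng)
        node.outside = build_vp_tree(
            [p for d, p in scored if d > node.radius], dist, rng)
    return node


def range_search(node: Optional[VPNode], q: Point, r: float,
                 dist: Metric = math.dist) -> List[Point]:
    """Return every stored object within distance r of q."""
    if node is None:
        return []
    d = dist(q, node.vantage)
    hits = [node.vantage] if d <= r else []
    # Triangle inequality: the query ball can reach the inner region
    # only if d - r <= radius, and the outer region only if d + r > radius.
    if d - r <= node.radius:
        hits += range_search(node.inside, q, r, dist)
    if d + r > node.radius:
        hits += range_search(node.outside, q, r, dist)
    return hits
```

In this sketch, picking a radius other than the median would produce an unbalanced tree; how the authors actually unbalance the structure is not specified in the abstract.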

Fully dynamic and memory-adaptative spatial approximation trees

Author: Arroyuelo Diego
Navarro Gonzalo
Reyes Nora Susana
Publication date: 01/10/2003

Hybrid dynamic spatial approximation trees are recently proposed data structures for searching in metric spaces, based on combining the concepts of spatial approximation and pivot-based algorithms. These data structures are hybrid schemes that keep the full features of dynamic spatial approximation trees and are able to use the available memory to improve query time. It has been shown that they compare favorably against alternative data structures in spaces of medium difficulty. In this paper we complete and improve hybrid dynamic spatial approximation trees by presenting a new search alternative, an algorithm to remove objects from the tree, and an improved way of managing the available memory. The result is a fully dynamic and optimized data structure for similarity searching in metric spaces.

Eje: Teoría (TEOR). Red de Universidades con Carreras en Informática (RedUNCI)
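The pivot-based half of such a hybrid can be illustrated with a generic pivot table. The sketch below is not the hybrid dynamic spatial approximation tree itself; it only shows how stored distances to pivots (extra memory) let a range query discard objects through the triangle inequality. The names `build_pivot_table` and `range_query` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def build_pivot_table(db: Sequence[Point], pivots: Sequence[Point],
                      dist: Metric = math.dist) -> List[List[float]]:
    """Precompute d(u, p) for every object u and pivot p."""
    return [[dist(u, p) for p in pivots] for u in db]


def range_query(db: Sequence[Point], pivots: Sequence[Point],
                table: List[List[float]], q: Point, r: float,
                dist: Metric = math.dist) -> Tuple[List[int], int]:
    """Return (indices within r of q, number of distance evaluations)."""
    dq = [dist(q, p) for p in pivots]
    evaluated = len(pivots)
    hits: List[int] = []
    for i, row in enumerate(table):
        # |d(q,p) - d(u,p)| is a lower bound on d(q,u); if it exceeds r
        # for any pivot, u cannot be an answer and is skipped unevaluated.
        if any(abs(a - b) > r for a, b in zip(dq, row)):
            continue
        evaluated += 1
        if dist(q, db[i]) <= r:
            hits.append(i)
    return hits, evaluated
```

More pivots mean more memory and a tighter filter, which is the trade-off behind using the available memory to improve query time.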

A hybrid data structure for searching in metric spaces

Author: Chávez Edgar
Herrera Norma Edith
Reyes Nora Susana
Publication date: 01/05/2004

The concept of “approximate” searching has applications in a vast number of fields. Some examples are non-traditional databases (e.g., storing images, fingerprints or audio clips, where the concept of exact search is of no use and we search instead for similar objects), text searching, information retrieval, machine learning and classification, image quantization and compression, computational biology, and function prediction.

Eje: Base de datos. Red de Universidades con Carreras en Informática (RedUNCI)

Optimizing the spatial approximation tree from the root

Author: Gómez Alejandro J.
Ludueña Verónica
Reyes Nora Susana
Publication date: 01/07/2008

Many computational applications need to look for information in a database. Nowadays, the predominance of nonconventional databases makes similarity search (i.e., searching for database elements that are "similar" to a given query) a central concept. The Spatial Approximation Tree has been shown to compare favorably against alternative data structures for similarity searching in metric spaces of medium to high dimensionality ("difficult" spaces) or for queries with low selectivity. However, during construction the tree root is selected at random, and the shape and performance of the tree are completely determined by this selection. Therefore, we are mainly interested in improving searches in this data structure by selecting the tree root so that it reflects some of the characteristics of the metric space to be indexed. We believe that selecting the root in this way allows the data structure to adapt better to the intrinsic dimensionality of the metric space considered, and thus achieves more efficient similarity searches.

Facultad de Informática
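To make the dependence on the root concrete, here is a minimal sketch of static Spatial Approximation Tree construction as it is usually described in the literature, with the root passed in explicitly. It does not implement the root-selection strategy the authors propose; `SATNode`, `build_sat`, `size` and `height` are illustrative names.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


@dataclass
class SATNode:
    obj: Point
    covering_radius: float = 0.0
    children: List["SATNode"] = field(default_factory=list)


def build_sat(root: Point, others: Sequence[Point],
              dist: Metric = math.dist) -> SATNode:
    """Build a SAT over `others` rooted at `root`."""
    node = SATNode(root)
    neighbors: List[Point] = []
    pending: List[Point] = []
    # Scan objects by increasing distance to the root; an object becomes a
    # neighbor if it is closer to the root than to every neighbor so far.
    for v in sorted(others, key=lambda u: dist(root, u)):
        d_root = dist(root, v)
        node.covering_radius = max(node.covering_radius, d_root)
        if all(d_root < dist(v, b) for b in neighbors):
            neighbors.append(v)
        else:
            pending.append(v)
    # Every remaining object goes into the subtree of its closest neighbor.
    bags: List[List[Point]] = [[] for _ in neighbors]
    for v in pending:
        closest = min(range(len(neighbors)),
                      key=lambda i: dist(v, neighbors[i]))
        bags[closest].append(v)
    node.children = [build_sat(b, bag, dist)
                     for b, bag in zip(neighbors, bags)]
    return node


def size(node: SATNode) -> int:
    return 1 + sum(size(c) for c in node.children)


def height(node: SATNode) -> int:
    return 1 + max((height(c) for c in node.children), default=0)


if __name__ == "__main__":
    line = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
    for r in (line[2], line[0]):
        tree = build_sat(r, [p for p in line if p != r])
        print(r, "height", height(tree))
```

On five collinear points, rooting at the middle point gives a tree of height 3, while rooting at an endpoint degenerates into a chain of height 5, which illustrates how strongly the shape depends on the root.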

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Servicio de Difusión de la Creación Intelectual

Búsquedas por similitud en PostgreSQL

Author: Kasián Fernando
Reyes Nora Susana
Publication date: 01/10/2012

Searching in metric spaces and similarity search operators have been studied and remain a recurring topic of study because of the rise of unconventional data, such as audio or video, available in large data repositories. Consequently, the need arises to store and later query such data. Nevertheless, there are no database management systems that implement all the relevant operators over data of this kind, for which similarity search makes the most sense. Thus, our work proposes developing a database management system that holds unstructured data and is able to answer the most common similarity operations over these data types, building on PostgreSQL.

Eje: Workshop Bases de datos y minería de datos (WBDDM). Red de Universidades con Carreras en Informática (RedUNCI)

Servicio de Difusión de la Creación Intelectual

Approximate Nearest Neighbor Graph via Index Construction

Author: Chávez Edgar
Kasián Fernando
Ludueña Verónica
Reyes Nora Susana
Publication date: 01/10/2016

Given a collection of objects in a metric space, the Nearest Neighbor Graph (NNG) associates each node with its closest neighbor under the given metric. It can be obtained trivially by computing the nearest neighbor of every object. To avoid computing the distance for every pair, an index could be constructed. Unfortunately, due to the curse of dimensionality, the indexed and the brute-force methods are almost equally inefficient. This draws attention to algorithms that compute approximate versions of the NNG. The DiSAT is a hierarchical proximity searching tree. The root computes the distances to all objects, and each child of the root recursively computes the distances to all the objects in its subtree. The top levels will compute the nearest neighbor accurately, and this information becomes less accurate as we descend the tree. If we perform a few rebuilds of the index, taking deep nodes in each iteration and keeping track of the closest known neighbor, it is possible to compute an Approximate NNG (ANNG). Accordingly, in this work we propose to obtain the ANNG by this approach, without performing any search, and we tested this proposal on both synthetic and real-world databases, with good results in both cost and response quality.

XIII Workshop Bases de datos y Minería de Datos (WBDMD). Red de Universidades con Carreras en Informática (RedUNCI)
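The bookkeeping idea in this abstract, recording the closest known neighbor as a by-product of index construction and repeating the build a few times, can be illustrated with a deliberately simplified stand-in. The sketch below does not implement the DiSAT or the authors' rebuild policy: it swaps in a random-root, median-split partition, and the function name `approximate_nng` and its parameters are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def approximate_nng(db: Sequence[Point], rounds: int = 3,
                    dist: Metric = math.dist, seed: int = 0) -> List[int]:
    """Return, for each object, the index of its closest neighbor seen so far.

    Stand-in hierarchy (NOT a DiSAT): each node picks a random root,
    measures it against every object in its subset, then splits the
    subset in two halves by distance and recurses. No search is run;
    only distances computed during construction are used.
    """
    rng = random.Random(seed)
    n = len(db)
    best: List[Tuple[float, int]] = [(math.inf, -1)] * n

    def record(i: int, j: int, d: float) -> None:
        # Each computed distance may improve the estimate of both endpoints.
        if d < best[i][0]:
            best[i] = (d, j)
        if d < best[j][0]:
            best[j] = (d, i)

    def build(ids: List[int]) -> None:
        if len(ids) < 2:
            return
        root = ids.pop(rng.randrange(len(ids)))
        scored = []
        for j in ids:
            d = dist(db[root], db[j])
            record(root, j, d)
            scored.append((d, j))
        scored.sort()
        half = len(scored) // 2
        build([j for _, j in scored[:half]])
        build([j for _, j in scored[half:]])

    for _ in range(rounds):  # each round is one rebuild with new roots
        build(list(range(n)))
    return [j for _, j in best]
```

In this stand-in, objects chosen as roots near the top get exact nearest neighbors while deeper ones see fewer candidates, so extra rounds can only keep or improve each estimate.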

Approximate Nearest Neighbor Graph via Index Construction

Author: Chávez Edgar
Kasián Fernando
Ludueña Verónica
Reyes Nora Susana
Publication date: 16/11/2016

List of Clustered Permutations in Secondary Memory

Author: Figueroa Karina
Paredes Rodrigo
Reyes Nora Susana
Roggero Patricia
Publication date: 01/10/2015

Similarity search is a difficult problem, and various indexing schemes have been defined to process similarity queries efficiently in many applications, including multimedia databases and other repositories handling complex objects. Metric indices support efficient similarity searches, but most of them are designed for main memory. Thus, they can handle only small datasets and suffer serious performance degradation when the objects reside on disk. Most real-life database applications require indices able to work on secondary memory. Among a plethora of indices, the List of Clustered Permutations (LCP) has been shown to be competitive in main memory, since it groups the permutations and establishes a criterion to discard whole clusters according to the permutation of their centers. We introduce a secondary-memory variant of the LCP, which maintains the low number of distance evaluations when comparing the permutations themselves, and also needs a low number of I/O operations at construction and searching.

XII Workshop Bases de Datos y Minería de Datos (WBDDM). Red de Universidades con Carreras en Informática (RedUNCI)
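The permutations mentioned above can be illustrated with the basic building block of permutation-based indexes: each object is represented by the order in which it sees a fixed set of reference objects, and two such orders are compared without touching the original metric. This sketch covers neither the clustering nor the secondary-memory layout of the LCP; the use of the Spearman footrule as the comparison and the names `permutation` and `spearman_footrule` are assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def permutation(obj: Point, permutants: Sequence[Point],
                dist: Metric = math.dist) -> List[int]:
    """Indices of the permutants ordered by increasing distance to obj."""
    # Ties are broken by permutant index so the result is deterministic.
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(obj, permutants[i]), i))


def spearman_footrule(p: Sequence[int], q: Sequence[int]) -> int:
    """Sum over permutants of the difference between their ranks in p and q."""
    rank_in_q = {perm: rank for rank, perm in enumerate(q)}
    return sum(abs(rank - rank_in_q[perm]) for rank, perm in enumerate(p))


if __name__ == "__main__":
    refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    a = permutation((1.0, 0.0), refs)
    b = permutation((9.0, 1.0), refs)
    print(a, b, spearman_footrule(a, b))
```

Comparing two permutations costs no evaluations of the original distance function, which is why schemes of this kind aim to keep distance computations low.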

Servicio de Difusión de la Creación Intelectual